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Abstract 

Energies with higher order non-submodular interactions 
have been shown to be very useful in vision due to their high 
modeling power Optimization of such energies, however, 
is generally NP-hard. One naive approach that works for 
small problem instances is exhaustive search, that is, enu- 
meration of all possible labelings of the underlying graph. 
We propose a general approach for minimizing complex 
high-order energies on large graphs that is based on enu- 
meration of labelings of certain small subsets. 

We show how to benefit from such partial enumerations 
by equivalently reformulating the optimization problem on 
a new graph. Each node of this graph represents a subset 
of pixels with feasible labels (node states) corresponding to 
enumeration of this subset. If pixel subsets are chosen as 
an overlapping partition of the image then our partial enu- 
meration approach simplifies the original high-order energy 
optimization to a pairwise interaction problem that can be 
efficiently solved by methods like TRW-S. We demonstrate 
that our approach often yields the exact global minimum 
(zero duality gap) for some high-order ( curvature) and non- 
submodular (de convolution) energies known to be difficult. 

1. Introduction 

Discrete optimization of MRF models is of fundamen- 
tal importance for may early vision problems. The ability 
to compute optimal or guaranteed near optimal solutions in 
polynomial time makes them suitable for a large class of 
problems. Applications include stereo-matching, segmen- 
tation, superresolution and deconvolution [ , ]. 

There are many approaches to solving these problems. 
When the energy consists of pairwise submodular interac- 
tions it can be minimized exactly in polynomial time using 
max-flow/min-cut algorithms [ ] . 

In case of non-submodular interactions one of the most 
successful methods is roof duality [ ]. The idea is to com- 
pute a lower bound on the optimal energy in polynomial 
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time using graph methods. If the lower bound is not tight 
roof duality provides partial solutions, that is, identifies sub- 
sets of pixels for which the optimal label can be determined. 
To compute a different lower bound Fix et al. [ ] propose 
to discard edges of the original graph until only a tree (or a 
low tree- width graph) remain. The resulting energy can be 
optimized efficiently using dynamic programming. 

Recently researchers have started to consider optimiz- 
ing energies with higher order interactions. A common ap- 
proach for solving these problems are reduction techniques, 
e.g. [ , ], where a higher order clique is replaced by a set 
of pairwise interactions by introducing auxiliary nodes. A 
generalization of roof duality (GRD) to cliques of order 3 
and 4 was presented in [ ]. 

A particular application with 4th order interactions that 
has received a lot of attention recently is segmentation with 
curvature regularization. In [ ] a linear programming for- 
mulation is used to compute tight lower bounds of the en- 
ergy. Shekhovtsov et al. [li ] learns a set of model patches 
with low curvature and then attempts to encourage assign- 
ments of pixels that agree with these patches. The result- 
ing energy roughly penalizes the curvature of the selected 
model patch and quality of fit to the patch, that is, the dif- 
ference between the model patch and the pixel assignments. 

For problems with non-binary labels state-of-the-art 
methods are based on message passing over trees [ , 10]. 
These methods compute a lower bound on the original prob- 
lem by decomposing the energy over subtrees and optimiz- 
ing these independently using message passing. In [ ] the 
dual decomposition scheme is extended to energies with 
higher order interactions. The problem is decomposed into 
sub problems that can be solved using message passing. A 
solution to the original problem is extracted by combining 
the solutions of the subproblems. 

1.1. Contributions and Related Work 

In this paper we demonstrate how to reformulate ener- 
gies (both pairwise and higher order), by enumerating all 
labelings of small subsets of pixels, so that optimization 
becomes easier. We show how it is possible to strengthen 



1 




Figure 1 : Examples of partial enumeration in optimization. Some instances of partial enumeration can be found in prior 
art: 3-rd order active contour model and tiered labeling energy on a grid [ ] are reduced to pairwise energies on a chain 
that can be optimized by dynamic programming (a-b). We propose partial enumeration as a general approach simplifying 
optimization of complex (non-submodular or high-order) energies, e.g. see (c). 



the lower bounds provided by standard algorithms such as 
[ ] using this kind of partial enumeration. Specifically, 
we group pixels into overlapping patches in a sliding win- 
dow fashion, see Fig. 1(c), and enumerate the states and 
costs for each patch. We then equivalently reformulate 
the same energy on a new graph, where each node cor- 
responds to a patch and its labels correspond to the enu- 
merated states. Note that if overlapping patches are large 
enough to cover all cliques in the original energy, then state 
enumeration allows to convert all arbitrarily complex (e.g. 
non-submodular or high-order) interactions into unary po- 
tentials in the reformulated energy. Such a significant sim- 
plication can not come for free. First of all, the label space 
for each node (patch P) of the new graph includes all partial 
labelings . This is much larger than the set of labels C for 
each pixel in the original graph. Moreover, it is necessary 
to introduce consistency constraints for each pair of over- 
lapping patches since the corresponding two partial label- 
ings should agree in the overlap region. The new graph will 
have pairwise interactions and can therefore be optimized 
using standard methods such as TRW-S [ ]. We show that 
this procedure generates stronger bounds than state-of-the- 
art methods do when using the original graph formulation. 
Aditionally it enables solving problems with higher order 
interactions using standard methods such as [8, 10]. 

Related work: Interestingly, some specific examples of 
partial enumeration could be found in prior art. For exam- 
ple, to optimize a snake with bending (3-clique) energy it is 
customary to combine each pair of adjacent control points 
into a single super-node, see Fig. 1(a). The number of states 
for each super-node is m? if m is the number of allowed 
states for each control point. However, the higher-order 
bending energy on the original snake reduces to a pair- wise 



energy on the chain of super-nodes. Despite increase of 
the label space, this problem can be very efficiently solved 
with dynamic programming in O(nm^). A similar reduc- 
tion to dynamic programming is used for MRF optimization 
in [ ] where high-order interactions are due to tiered label- 
ing constraints. This approach can also be seen as special 
case of partial enumeration, even though non-overlapping 
"patches" are sufficient in their case , see Fig. 1(b). 

We present partial enumeration as a general approach 
for simplifying MRF labeling problems using overlapping 
patches to reduce complex interactions to unary potentials. 
Jung et al. [ ] also use patches to simplify complex po- 
tentials. They iterate by randomly sampling patches of 
some small size. Once some patch is selected, they min- 
imize the energy within that patch (using enumeration or 
dynamic programming, if possible) while assignment of all 
other graph nodes is fixed. This could be seen as a patch- 
based extension of ICM algorithm. In contrast, we optimize 
over all patches simultaneously. 

2. Framework 

For simplicity we will start by assuming that our energy 
consists of pairwise interactions on a regular 4-connected 
pixel grid. We represent this grid using a graph Q = (V, E), 
where V are the nodes (pixels) and 8 are the edges. To 
each edge we associate an interaction cost Eij{xi^ Xj). We 
assume that this cost is positive but not necessarily submod- 
ular. Our energy is now 

The space of feasible assignments £, are the binary vectors 
of size n, {0, 1}^ where n is the number of nodes. The 



Figure 2: Left. 4-connected graph. Right: Sub-tree. 

variable xi is the assignment to node i and x G £ is the 
vector containing all the node assignments. 

It is well known that optimization of such energies are 
in general NP-hard. In the special case of an energy de- 
fined on a tree it is possible to compute the global optimum 
in polynomial time using, for example, message passing. 
Therefore researchers have successfully applied tree based 
algorithms for these problems. For example [ , ] decom- 
poses the grid into trees, optimize these and computes a 
lower bound on the original energy. Fix et al. [^] discard 
edges of the original graph until only a tree (or low tree- 
width graph) remains. In this section we will show how 
it is possible to use partial enumeration to strengthen these 
approaches. 

2.1. Inference using "Super Nodes" 

The intuition behind our construction is that by enumer- 
ating local interactions and encoding them into labels in- 
stead of edges it is possible to make the trees that we opti- 
mize over contain more edges from the original graph than 
regular subtrees. Therefore the lower bounds provided by 
the former trees should be stronger than the latter. 

Let us fist consider a subtree T = (V, 8^) of the initial 
graph Q (see figure 2) . Since we have discarded edges we 
have 

^^(x) <^^(x), (2) 

with equality if and only if values of the interactions at the 
discarded edges are all zero. Hence optimizing over this tree 
provides a lower bound on our original energy. The goal of 
our construction is now to strengthen this lower bound. 

The idea is very simple: we will group neighboring 
nodes together into "super nodes" and enumerate all the val- 
ues of the interactions between them. We start by forming 
patches of size 2x2 into super nodes. This is done in a slid- 
ing window fashion, that is, super node (r, c) consists of the 
nodes (r, c), (r, c+ 1), (r + 1, c) and (r + 1, c+ 1), where r 
and c are the row and column coordinates of the pixels. We 
call the set of super nodes V2 x 2 • 

Each super node label can take 16 values corresponding 
to states of the individual pixels. We call the set of possible 
assignments for this graph £2x2- The pairwise interactions 
of the original problem are now transformed to unary po- 
tentials since we enumerate all states within the super node. 



Figure 3: Supernodes are formed in a sliding window fa- 
sion. The red pixels occur in 4 of the new super nodes. 
Pairwise interactions between the edges ensure that shared 
pixels are assigned the same value in all super nodes. 

Note that since patches are overlapping the edges of the 
original graph can be contained in up to two super nodes. 
In order not to change the energy we weight the new unary 
potentials such that the total contribution of the edges is the 
same as in the original graph. There are many ways of do- 
ing this, but for simplicity we give edges that occur twice 
the weight 1/2. 

Finally to ensure that each pixel takes the same value 
in all the super nodes where it is contained we add pairwise 
"consistency" edges £^2x2 between neighboring supernodes 
(see Figure 3). That is, label assignments where neighbor- 
ing nodes assign shared pixels different values are penalized 
with a large value (we use 10^^ in our implementations). 
Note that all super nodes that share pixels do not have a 
consistency edge, it is enough to use a 4-connected neigh- 
borhood structure. 

The energy E'^^ ^ of the new graph ^2 x 2 = ( V2 x 2 , ^2 x 2 ) 
fulfills 

min£;^(x) = min E'^^Jy?'"'^) (3) 

by construction (assuming that the inconsistency penalty is 
chosen large enough). Here x^^^ is the vector containing 
assignments of the super nodes. 

2.2. Lower Bounds and Consistency of Tree- 
Assignments 

The subtrees of ^2 x 2 has a useful property that subtrees 
of Q does not have. Namely, if optimization over a subtree 
of 02 X 2 gives the correct solution then the lower bound will 
also be tight. The reason is that subtrees of ^2 x 2 still con- 
tain all the edges of the original graph within their (super) 
nodes. 

Consider a subtree 72x2 = (^2x2,^2^2) of ^2x2. We 
form 72 X 2 by discarding edges from ^2 x 2 • When solving 



the solution might therefore violate some of the consistency 
constraints of ^2 x 2 • We will say that an assignment x^ ^ ^ 
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Figure 4: Top: A Sub tree 72 x 2 of ^2 x 2 in Figure 3. Bottom: 
72 X 2 contains two copies (red and green edges) of the tree 
in Figure 3. 



is consistent if all super nodes that share a pixel assigns this 
pixel the same value. Since such assignments do not violate 
any of the consistency constraints we have 
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Therefore, if the solution of (4) is consistent, then the lower 
bound given by the tree will be tight 
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The subtrees of ^2x2 also have the advantage that they 
often provide stronger lower bounds than subtrees of Q even 
if the solution is not consistent. Consider for example the 
subtree T of ^ in Figure 2. We can form a similar tree 72 x 2 
from ^2x2, see Figure 4. Looking in the super nodes of 
72 X 2 and considering the consistency edges we see that we 
can find two instances of T within 72 x 2 (see bottom row 
Figure 4) both with weights 1/2 (here the edges that have 
weight 1 are allowed to belong to both trees). Hence if we 
view these two instances as independent and optimize them 
we get the same energy as optimization over T would give. 
In addition there are other edges present in 72x2, therefore 
the lower bound provided by this tree will be at least as good 
as lower bound given by T. 

2.3. Improving the Bound by Increasing Node Size 

In case the lower bound given by 72 x 2 is not tight we 
can try to improve it by making the super nodes bigger and 
thereby making them contain more interactions. The key 
observation is that if the bound is not tight, then there is an 
inconsistency somewhere. By concatenating patches into 
new super nodes we can make sure that the previous incon- 
sistencies do not occur thereby tightening the lower bound. 



Figure 5: The nodes of ^3x3 are created by grouping nodes 
of ^2x2 in groups of 2 x 2 in a sliding window fashion. 
Since consistency is enforced within nodes (e.g. the red 
pixel is the same everywhere) the result is that each node 
will contain a 3 x 3 patch. 
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Figure 6: Tree T3X3. Red patch occurs once, yellow twice 
and blue four times. 



Therefore we group the nodes of ^2x2 into groups of 
2 X 2 in a sliding window fashion. Since neighboring nodes 
have common pixels that are fused into one (see Figure 5) , 
the result is graph 03x3(V3x3, ^sxs)- The edge set is con- 
structed similar to that of ^2 x 2 • Note that even though three 
consecutive nodes will share pixels it is enough to enforce 
consistency using a 4-connected neighborhood. In this con- 
struction all patches except those that are at image borders 
will occur 4 times within the new nodes (see Figure 6). We 
therefore weight the costs so that their total contribution to 
the energy does not change. 

The energy E'^^ 3 represented by the new graph fulfills 

min^3x3(x3x3) = rnin^2^x2(x2x2). (7) 



If the bound obtained from the sub tree Tsxs is not tight 
we can recursively do the same thing again and increase it 
further. In this case we obtain sHding window patches of 
size 4x4. 

Note that in each step of increasing the patch size the 
graph size is reduced (by one row and one column). There- 
fore in the extreme case we will only have one patch and 
will therefore be enumerating all combinations of pixel as- 
signments, so it is clear that the lower bound will approach 
the optimum. 

Remark In the beginning of this section we assumed that 
the energy consists of pairwise interactions. However we 
can do the same thing for higher order interactions. The 
only restriction is that we start with super nodes that con- 
tain all the pixels involved in the interaction. In Section 3 
we use patches of size up to 5 x 5 to encode higher order 
interactions. 

3. Experiments 

In this section we will test our approach of partial enu- 
meration from Section 2. We optimize the resulting graphs 
using TRW-S [ ]. We use different super node size depend- 
ing on the size of the interactions, but we always use a 4- 
connected neighborhood for enforcing consistency. In all 
the tested cases, TRW-S with our graph construction pro- 
vided a solution and a lower bound with the same energy 
(same up to machine precision). 

3.1. Stereo 

In this section we optimize the energies occurring in 
Woodford et al. [ ]. The goal is to find a dense dispar- 
ity map for a given pair of images. The regularization of 
this method penalizes second order smoothness of the dis- 
parity map. The 2nd derivative is estimated from three con- 
secutive disparity values (vertically or horizontally). Thus, 
the resulting optimization problem contains 3rd-order non- 
submodular terms that are challenging to optimize. 

To solve this problem [11] uses fusion moves. In this 
move two proposals are fused together into a new dispar- 
ity map with lower energy. To compute the fusion move 
[17] first reduces the interactions to pairwise (by introduc- 
ing auxiliary nodes) and applies Roof duality (RD) [ ]. 
In contrast we decompose the problem into patches of size 
3x3, that contain entire triple cliques. Note that the in- 
teractions will occur in as many as three supernodes. We 
therefore weight these so that the energy does not change. 
Each super node consists of 9 pixels. We allow all combi- 
nations of these and therefore there are 2^ = 512 labels. 

Table 1 shows the results for the Cones dataset from 
[14] when fusing "SegPln" proposals [ ]. Starting from 
a randomized disparity map we fuse all the proposals. To 



ensure that each subproblem is identical for the two ap- 
proaches, we feed the solution from roof duality into our 
approach before running the next fusion. The proposals on 
average result in 24% unlabeled pixels when using RD. Fur- 
ther more there is a one percent relative duality gap for RD. 
We also tested using the "improve" heuristic [ ] for com- 
pleting the labeling. This gives a slight reduction in duality 
gap. Running probe [ ] instead of improve is not feasible 
due to the large number of unlabeled variables. 

We also compared the final energies when we run the 
methods independent of each other (not feeding solutions 
into our approach), in this case our final solution has 0.82% 
lower energy than that of RD with "improve". 

3.2. Binary Deconvolution 

In this section we test our method on binary deconvolu- 
tion. Figure 7 (a) shows a circle convolved with a 3 x 3 
mean value kernel with additional noise added to it. The 
goal is to recover the original (binary) image. We formu- 
late the energy as outlined in [ ] . Figure 7 (b) shows the 
solution obtained using RD, here the gray pixels are unla- 
beled. For comparison we also plotted the solution obtained 
when solving the the same problem as RD but with TRW-S. 
The solution obtained is quite poor and there is a substan- 
tial duality gap. In contrast (d) shows the solution obtained 
when first forming super nodes with patches of size 3x3 
and then applying TRW-S. In this case there is no duality 
gap, so the obtained solution is optimal. To solve the super 
node problem takes 2.6134 seconds, in contrast the solu- 
tions provided by RD and TRW-S (with the original graph 
formulation) takes 0.0324 and 0.8825 seconds respectively 
to compute. 

3.3. Curvature 

Next we apply our framework to segmentation with cur- 
vature regularization. Throughout this section we will use 
approximations of the energy 

E{S) = / Cint{x)dx^ / CQ^t{x)dx^ / X\K\da, 

Jmt{S) JQxtiS) JdS 

(8) 

where Cint{x) and Cext{x) is the cost of assigning pixel x to 
interior and exterior respectively, is the curvature and A is 
the weight of the regularization. We use absolute curvature 
since it does not have any growing bias, but in principle any 
other version could be used. 

First, we consider a very coarse approximation of cur- 
vature on a 4-connected grid. In this case we only allow 
boundary edges that are either horizontal or vertical and, 
therefore, patches of size 2 x 2 are enough to approximate 
the curvature of such boundaries. Figure 8 shows 4 of the 
16 patch states used to encode curvature (the others are rota- 
tions and inversions of these). Figure 9 compares the results 
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Figure 7: Binary Deconvolution. (a) - Convolved image with added noise, (b) - Solution provided by RD. Gray pixels are 
unlabeled, (c) - Solution obtained by optimizing the pairwise energy (without forming patches) with TRW-S. The energy 
of this solution is 24.0214 and the lower bound provided by TRW-S is -11.7096. (d) - Solution obtained from TRW-S 
when using patches of size 3x3. The energy of this solution is 15.531330 and the lower bound provided by TRW-S is also 
15.531330. 
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Figure 8: Four patch states used for encoding 4-connected 
curvature. The penalties for the patches are 0, tt/ 2, and tt 
respectively. 

of our curvature-based and more standard length-based reg- 
ularizations. Curvature regularization preserves thin elon- 
gated vertical and horizontal structures while suppressing 
the noise. In contrast, length regularization can not preserve 
thin objects and suppress noise at the same time. 

Figure 10 compares our approach to Generalized Roof 
Duality [ ] for different levels of regularization. GRD only 
labels pixels for very moderate levels of regularization. In 
contrast, our approach provides globally optimal solutions 
with no duality gap. 

3.3.1 Better Approximation of Curvature 

For patches of size 2 x 2 it is only possible to encourage 
horizontal and vertical boundaries. For example, along a 
diagonal boundary edge all 2 x 2 patches look like the sec- 



ond patch in Figure 8. To make the model more accurate 
and include directions that are multiples of 45 degrees we 
can increase the size of the patches. For multiples of 45 
degrees it is enough to have 3x3 patches and for 22.5 de- 
grees we use patches of size 5x5. However, the number of 
distinct patch-labels needed to encode the interactions (tran- 
sitions between directions) is quite high. It is not feasible to 
determine their costs by hand (as in 2 x 2 case). 

To compute the specific label costs we generate repre- 
sentative windows of size slightly larger than the patch (for 
3x3 patches we use 5x5 windows) that contain either a 
straight line or a transition between two directions of known 
angle difference, i.e. curvature, see Fig. 1 1 . From this win- 
dow we can determine which super node assignments occur 
in the vicinity of different transitions. We extract all the 
assignments and constrain their sum to be the known curva- 
ture of the window. Furthermore, we require that the cost of 
each label is positive. If a certain label is not present in any 
of the windows we do not allow this assignment. This gives 
us a set of linear equalities and inequalities for which we 
can find a solution (using linear programming). The proce- 
dure gives 122 and 2422 labels for the 3 x 3 and 5x5 cases 
respectively. 

The approach of modeling curvature with patches is 
somewhat related to the idea of ^] (however the optimiza- 
tion is completely different). 



Figure 9: Curvature may potentially preserve long thin structures in contrast to length, (a) - Original image, (b) - Image with 
added noise, (c) - Thresholded solution, (d) 4-connected curvature regularization, (e) Length regularization with weights 
selected as to preserve the thin structure, (f) Length regularization with slightly larger regularization. 
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Figure 10: Our results for 4-connected curvature with different regularization weight A (top row of images). TRW-S with 
super nodes gives a tight lower bound. GRD only labels nodes for low regularization weight (the figures within parenthesis 
are results when using the heuristics proposed for speedup in [ . ]) . 




Figure 1 1 : Seven of the 5 x 5 windows used for computing 
patch costs for 3 X 3 curvature. (The rest are obtained as 
rotations, reflections of the patches and inversions of the 
pixel assignments. 

Figures 12 and 13 show the effects of using the different 
regularizations. In Figure 12 we took an image of a circle 
and segmented it using 3 types of approximating patches. 
Note that there is no noise in the image, so simply truncat- 
ing the data term would give the correct result. We segment 
this image using a high regularization weight (A = 20 in 
all three cases). In (b) horizontal and vertical boundaries 
are favored since these have zero regularization cost. In (b) 
and (c) the number of directions with zero cost is increased 
and therefore the approximation improved with the patch 
size. Computing these segmentations takes 1.46, 8.94 and 
55.63 seconds for (a),(b) and (c) respectively. In the case 
of 5 X 5 patches the enumeration takes a large portion of 
time (in fact, only 14.91 seconds is spent in TRW-S). Fig- 



ure 13 shows segmentations with the different patch sizes. 
To compute these results we used different regularization 
weight (A = 1, 0.2, 0.1 for patch size 2 x 2, 3 x 3 and 5x5 
respectively). The same effects are visible here. The com- 
putation times are 27.74, 36.91 seconds for patch size 2x2 
and 3x3 respectively. The 5x5 version seems to be much 
more difficult to solve and requires almost 30 minutes of 
computation. (Out of this 15 minutes is enumeration.) 
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